Detecting and responding to sounds for autonomous vehicles

ABSTRACT

The technology relates to detecting and responding to sounds for a vehicle having an autonomous driving mode. In one example, an audible signal corresponding to a sound received at one or more microphones of the vehicle may be received. Sensor data generated by a perception system of the vehicle identifying objects in an environment of the vehicle may be received. A type of sound may be determined by inputting the audible signal into a classifier. A set of additional signals may be determined based on the determined type of sound. The sensor data may be processed in order to identify one or more additional signals of the identified set of additional signals. The vehicle may be controlled in the autonomous driving mode in order to respond to the sound based on the one or more additional signals and the type of sound.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 16/108,411, filed Aug. 22, 2018, the entire disclosure of which is incorporated herein by reference.

BACKGROUND

Autonomous vehicles, such as vehicles that do not require a human driver, can be used to aid in the transport of passengers or items from one location to another. Such vehicles may operate in a fully autonomous mode where passengers may provide some initial input, such as a pickup or destination location, and the vehicle maneuvers itself to that location. In order to do so safely, these vehicles must be able to detect and identify objects in the environment as well as respond to them quickly. Typically, these objects are identified from information that can be perceived by sensors such as LIDAR, radar, or cameras.

In some instances, sound can be a critically important signal for determining how the vehicle should respond to its environment. For instance, railroad crossing bells, train whistles, beeping sounds emanating from reversing trucks, crosswalk chirping sounds, etc. can all provide important contextual cues to human drivers about what may be happening, in many cases before a human driver is able to visually perceive the situation. Therefore, being able to detect and respond to such sounds can be especially important to ensuring safe and effective autonomous driving.

BRIEF SUMMARY

One aspect of the disclosure provides a method of detecting and responding to sounds for a vehicle having an autonomous driving mode. The method includes receiving an audible signal corresponding to a sound received at one or more microphones of the vehicle; receiving sensor data generated by a perception system of the vehicle identifying objects in an environment of the vehicle, the perception system including one or more sensors; determining a type of sound by inputting the audible signal into a classifier; identifying a set of additional signals based on the determined type of sound; processing the sensor data in order to identify one or more additional signals of the set of additional signals; and controlling the vehicle in the autonomous driving mode in order to respond to the sound based on the one or more additional signals and the type of sound.

In one example, the method also includes, in response to inputting the audible signal into the classifier, receiving a likelihood value for the type of sound, and determining that the likelihood value meets a threshold, wherein identifying the one or more additional signals is performed when the likelihood value is determined to meet the threshold. In this example, the method also includes increasing the likelihood value based on the identified one or more additional signals, and wherein controlling the vehicle is further based on the increased likelihood value. In addition, the method also includes determining that the increased likelihood value meets a second threshold likelihood value, and wherein controlling the vehicle is further based on the determination that the increased likelihood value meets the second threshold likelihood value. In another example, the method also includes determining that the one or more additional signals are a predetermined combination of additional signals, and wherein controlling the vehicle is further based on the determination that the one or more additional signals are a predetermined combination of additional signals. In another example, the method also includes training the classifier using examples of sounds relevant to driving decisions. In this example, sounds relevant to driving include sounds which would likely cause a change in behavior of the vehicle. In another example, the type of sound is a train whistle and the one or more additional signals includes one or more of a flashing light, a gate, a train, a train station identified in pre-stored map information of the vehicle within a predetermined distance of a current location of the vehicle, or a railroad crossing identified in pre-stored map information of the vehicle within a predetermined distance of a current location of the vehicle. In another example, the type of sound is a reverse beeping sound, and the one or more additional signals includes one or more of a vehicle of a given size or flashing lights. In another example, the type of sound is a crosswalk chirp, and the one or more additional signals includes one or more of a crosswalk, a walk sign, or flashing lights. In another example, the method also includes, prior to identifying the one or more additional signals, controlling the vehicle in the autonomous driving mode in order to perform an initial response based on the type of sound.

Another aspect of the disclosure provides a system for detecting and responding to sounds for a vehicle having an autonomous driving mode. The system includes one or more processors configured to receive an audible signal corresponding to a sound received at one or more microphones of the vehicle; receive sensor data generated by a perception system of the vehicle identifying objects in an environment of the vehicle, the perception system including one or more sensors; determine a type of sound by inputting the audible signal into a classifier; identify a set of additional signals based on the determined type of sound; process the sensor data in order to identify one or more additional signals of the set of additional signals; and control the vehicle in the autonomous driving mode in order to respond to the sound based on the one or more additional signals and the type of sound.

In one example, the one or more processors are further configured to, in response to inputting the audible signal into the classifier, receive a likelihood value for the type of sound and determine that the likelihood value meets a threshold, wherein identifying the one or more additional signals is performed when the likelihood value is determined to meet the threshold. In this example, the one or more processors are further configured to increase the likelihood value based on the identified one or more additional signals, and to control the vehicle further based on the increased likelihood value. In addition, the one or more processors are further configured to determine that the increased likelihood value meets a second threshold likelihood value, and to control the vehicle further based on the determination that the increased likelihood value meets the second threshold likelihood value. In addition or alternatively, the one or more processors are further configured to determine that the one or more additional signals are a predetermined combination of additional signals, and wherein controlling the vehicle is further based on the determination that the one or more additional signals are a predetermined combination of additional signals. In another example, the type of sound is a railroad warning bell and the one or more additional signals includes one or more of a flashing light, a gate, a train, a train station identified in pre-stored map information of the vehicle within a predetermined distance of a current location of the vehicle, or a railroad crossing identified in pre-stored map information of the vehicle within a predetermined distance of a current location of the vehicle. In another example, the type of sound is a reverse beeping sound, and the one or more additional signals includes one or more of a vehicle of a given size or flashing lights. In another example, the type of sound is a crosswalk chirp, and the one or more additional signals includes one or more of a crosswalk, a walk sign, or flashing lights. In another example, the system also includes the vehicle, the perception system, and the one or more microphones.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional diagram of an example vehicle in accordance with aspects of the disclosure.

FIG. 2 is an example representation of map information in accordance with aspects of the disclosure.

FIG. 3 is an example external view of a vehicle in accordance with aspects of the disclosure.

FIG. 4 is an example view of a section of roadway corresponding to the map information of FIG. 2 in accordance with aspects of the disclosure.

FIG. 5 is a flow diagram in accordance with aspects of the disclosure.

DETAILED DESCRIPTION

Overview

The technology relates to detecting and responding to sounds for autonomous vehicles or vehicles operating in an autonomous driving mode. In order to do so, in addition to the perception system which uses lasers, radar, sonar, cameras or other sensors to detect objects in the environment, the autonomous vehicle may be equipped with a series of microphones or microphone arrays arranged at different locations on the vehicle. The perception system may provide the vehicle's computing devices with sensor data including processed and “raw” data from the various sensors.

Once received by the vehicle's computing devices, the output of the microphones may be input into a classifier. In order to train the classifier, examples of sounds relevant to driving decisions may be collected and labeled, for instance by a human operator or otherwise, and used to train the classifier. Once trained, the classifier may be used to identify types of sounds received at the microphones as well as a confidence or likelihood value for each type of sound.

Each type of sound may be associated with a set of additional signals or information. Once a particular type of sound is identified by the classifier at a first likelihood threshold value, the vehicle's computing devices may begin to analyze sensor data from the perception system in order to identify one or more additional signals of the set of additional signals associated with the particular type of sound. Using the first likelihood threshold value may keep the vehicle's computing devices from searching for additional signals which are very unlikely to actually be occurring, which would be a waste of computing resources.

Each additional signal of the identified set of additional signals that is identified may be used to increase the likelihood value that the type of sound identified by the classifier is in fact a real sound. In addition, these additional signals may be used to identify what object is actually making the sound. Once a second likelihood threshold value is met, the vehicle's computing devices may actively control the vehicle in order to respond to that sound. The second likelihood threshold value may correspond to a likelihood value greater than the first likelihood threshold value, the identification of at least one additional signal, and/or the identification of a specific or predetermined combination of additional signals. Responding may include responding to an object identified as emanating the sound by controlling the vehicle in an autonomous driving mode in order to yield to that object or simply driving more cautiously.
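By way of illustration only, the two-threshold behavior described above may be sketched as follows. The threshold values, the fixed per-signal increase, and the function names are assumptions made for this sketch and are not specified by the disclosure, which leaves open how the likelihood value is increased.

```python
# Illustrative sketch only: all constants and names below are hypothetical.
FIRST_THRESHOLD = 0.5    # assumed value; gates the search for additional signals
SECOND_THRESHOLD = 0.8   # assumed value; gates an active response
BOOST_PER_SIGNAL = 0.15  # assumed fixed increase per corroborating signal


def updated_likelihood(initial, found_signals):
    """Return the likelihood after accounting for corroborating signals.

    Below the first threshold no search is performed, so the value is
    returned unchanged regardless of what the sensor data contains.
    """
    if initial < FIRST_THRESHOLD:
        return initial
    boosted = initial + BOOST_PER_SIGNAL * len(found_signals)
    return min(boosted, 1.0)


def should_respond(initial, found_signals):
    """True when the (possibly increased) likelihood meets the second threshold."""
    return updated_likelihood(initial, found_signals) >= SECOND_THRESHOLD
```

In this sketch, a train whistle classified at 0.6 and corroborated by a gate and a flashing light would trigger a response, whereas the same classification with no corroborating signal would not.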

The features described herein allow a vehicle driving in an autonomous mode to automatically detect and respond to sounds. Not only does this allow the vehicle to react to situations when objects relevant to such situations are occluded and even before such situations would be “visible” to other sensors such as LIDAR and cameras, but by doing so, it also allows the vehicle more time to respond to such situations. This in turn may make the vehicle significantly safer on the roads.

Example Systems

As shown in FIG. 1, a vehicle 100 in accordance with one aspect of the disclosure includes various components. While certain aspects of the disclosure are particularly useful in connection with specific types of vehicles, the vehicle may be any type of vehicle including, but not limited to, cars, trucks, motorcycles, buses, recreational vehicles, etc. The vehicle may have one or more computing devices, such as computing devices 110 containing one or more processors 120, memory 130 and other components typically present in general purpose computing devices.

The memory 130 stores information accessible by the one or more processors 120, including instructions 132 and data 134 that may be executed or otherwise used by the processor 120. The memory 130 may be of any type capable of storing information accessible by the processor, including a computing device-readable medium, or other medium that stores data that may be read with the aid of an electronic device, such as a hard-drive, memory card, ROM, RAM, DVD or other optical disks, as well as other write-capable and read-only memories. Systems and methods may include different combinations of the foregoing, whereby different portions of the instructions and data are stored on different types of media.

The instructions 132 may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor. For example, the instructions may be stored as computing device code on the computing device-readable medium. In that regard, the terms “instructions” and “programs” may be used interchangeably herein. The instructions may be stored in object code format for direct processing by the processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. Functions, methods and routines of the instructions are explained in more detail below.

The data 134 may be retrieved, stored or modified by processor 120 in accordance with the instructions 132. As an example, data 134 of memory 130 may store predefined scenarios. A given scenario may identify a set of scenario requirements including a type of object, a range of locations of the object relative to the vehicle, as well as other factors such as whether the autonomous vehicle is able to maneuver around the object, whether the object is using a turn signal, the condition of a traffic light relevant to the current location of the object, whether the object is approaching a stop sign, etc. The requirements may include discrete values, such as “right turn signal is on” or “in a right turn only lane”, or ranges of values such as “having a heading that is oriented at an angle that is 30 to 60 degrees offset from a current path of vehicle 100.” In some examples, the predetermined scenarios may include similar information for multiple objects.

The one or more processors 120 may be any conventional processors, such as commercially available CPUs. Alternatively, the one or more processors may be a dedicated device such as an ASIC or other hardware-based processor. Although FIG. 1 functionally illustrates the processor, memory, and other elements of computing devices 110 as being within the same block, it will be understood by those of ordinary skill in the art that the processor, computing device, or memory may actually include multiple processors, computing devices, or memories that may or may not be stored within the same physical housing. Similarly, the memory may be a hard drive or other storage media located in a housing different from that of computing devices 110. Accordingly, references to a processor or computing device will be understood to include references to a collection of processors or computing devices or memories that may or may not operate in parallel.

Computing devices 110 may include all of the components normally used in connection with a computing device such as the processor and memory described above as well as a user input 150 (e.g., a mouse, keyboard, touch screen and/or microphone) and various electronic displays (e.g., a monitor having a screen or any other electrical device that is operable to display information). In this example, the vehicle includes a series of microphones 152 or microphone arrays arranged at different locations on the vehicle. As shown, microphone arrays are depicted as separate from the perception system 172 and incorporated into the computing devices 110. However, all or some of microphones 152 may be incorporated into the perception system 172 or may be configured as a separate system. In this regard, the microphones may be considered independent computing devices operated via a microcontroller which sends signals to the computing devices 110.

In one example, computing devices 110 may be an autonomous driving computing system incorporated into vehicle 100. The autonomous driving computing system may be capable of communicating with various components of the vehicle. For example, returning to FIG. 1, computing devices 110 may be in communication with various systems of vehicle 100, such as deceleration system 160 (for controlling braking of the vehicle), acceleration system 162 (for controlling acceleration of the vehicle), steering system 164 (for controlling the orientation of the wheels and direction of the vehicle), signaling system 166 (for controlling turn signals), navigation system 168 (for navigating the vehicle to a location or around objects), positioning system 170 (for determining the position of the vehicle), perception system 172 (for detecting objects in the vehicle's environment), and power system 174 (for example, a battery and/or gas or diesel powered engine) in order to control the movement, speed, etc. of vehicle 100 in accordance with the instructions 132 of memory 130 in an autonomous driving mode which does not require or need continuous or periodic input from a passenger of the vehicle. Again, although these systems are shown as external to computing devices 110, in actuality, these systems may also be incorporated into computing devices 110, again as an autonomous driving computing system for controlling vehicle 100.

The computing devices 110 may control the direction and speed of the vehicle by controlling various components. By way of example, computing devices 110 may navigate the vehicle to a destination location completely autonomously using data from the map information and navigation system 168. Computing devices 110 may use the positioning system 170 to determine the vehicle's location and perception system 172 to detect and respond to objects when needed to reach the location safely. In order to do so, computing devices 110 may cause the vehicle to accelerate (e.g., by increasing fuel or other energy provided to the engine by acceleration system 162), decelerate (e.g., by decreasing the fuel supplied to the engine, changing gears, and/or by applying brakes by deceleration system 160), change direction (e.g., by turning the front or rear wheels of vehicle 100 by steering system 164), and signal such changes (e.g., by lighting turn signals of signaling system 166). Thus, the acceleration system 162 and deceleration system 160 may be a part of a drivetrain that includes various components between an engine of the vehicle and the wheels of the vehicle. Again, by controlling these systems, computing devices 110 may also control the drivetrain of the vehicle in order to maneuver the vehicle autonomously.

As an example, computing devices 110 may interact with deceleration system 160 and acceleration system 162 in order to control the speed of the vehicle. Similarly, steering system 164 may be used by computing devices 110 in order to control the direction of vehicle 100. For example, if vehicle 100 is configured for use on a road, such as a car or truck, the steering system may include components to control the angle of wheels to turn the vehicle. Signaling system 166 may be used by computing devices 110 in order to signal the vehicle's intent to other drivers or vehicles, for example, by lighting turn signals or brake lights when needed.

Navigation system 168 may be used by computing devices 110 in order to determine and follow a route to a location. In this regard, the navigation system 168 and/or data 134 may store map information, e.g., highly detailed maps that computing devices 110 can use to navigate or control the vehicle. As an example, these maps may identify the shape and elevation of roadways, lane markers, intersections, crosswalks, speed limits, traffic signal lights, buildings, signs, real time traffic information, vegetation, or other such objects and information. The lane markers may include features such as solid or broken double or single lane lines, solid or broken lane lines, reflectors, etc. A given lane may be associated with left and right lane lines or other lane markers that define the boundary of the lane. Thus, most lanes may be bounded by a left edge of one lane line and a right edge of another lane line.

FIG. 2 is an example of map information 200 for a section of roadway.

The map information 200 includes information identifying the shape, location, and other characteristics of various road features proximate to intersection 202 and railroad crossing 204. In this example, the map information 200 includes information defining the shape and location of lane markers 210-214, railroad crossing gates 220, 222, crosswalks 230, 232, sidewalk 240, stop signs 250, 252, as well as the shape and direction of traffic for lanes 260, 262, etc. Although the example of map information 200 includes only a few road features, for instance, lane lines, shoulder areas, an intersection, and lanes and orientations, map information 200 may also identify various other road features such as traffic signal lights, crosswalks, sidewalks, stop signs, yield signs, speed limit signs, road signs, speed bumps, etc. Although not shown, the map information may also include information identifying speed limits and other legal traffic requirements, such as which vehicle has the right of way given the location of stop signs or state of traffic signals, etc.

Although the detailed map information is depicted herein as an image-based map, the map information need not be entirely image based (for example, raster). For example, the detailed map information may include one or more roadgraphs or graph networks of information such as roads, lanes, intersections, and the connections between these features. Each feature may be stored as graph data and may be associated with information such as a geographic location and whether or not it is linked to other related features, for example, a stop sign may be linked to a road and an intersection, etc. In some examples, the associated data may include grid-based indices of a roadgraph to allow for efficient lookup of certain roadgraph features.
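As a minimal sketch of what a grid-based index over roadgraph features might look like, the following buckets features by cell so that a lookup near the vehicle inspects only neighboring cells. The class name, the 50 meter cell size, the planar coordinates, and the feature identifiers are assumptions made for illustration; the disclosure does not specify the index layout.

```python
from collections import defaultdict

CELL_SIZE_M = 50.0  # assumed cell edge length in meters


class RoadgraphIndex:
    """Hypothetical grid index mapping cells to the features they contain."""

    def __init__(self, cell_size=CELL_SIZE_M):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Integer cell coordinates for a planar position in meters.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def add(self, feature_id, x, y):
        self.cells[self._cell(x, y)].append((feature_id, x, y))

    def nearby(self, x, y, radius):
        """Return ids of features within `radius` meters of (x, y)."""
        cx, cy = self._cell(x, y)
        reach = int(radius // self.cell_size) + 1
        hits = []
        for i in range(cx - reach, cx + reach + 1):
            for j in range(cy - reach, cy + reach + 1):
                # .get avoids creating empty buckets in the defaultdict.
                for feature_id, fx, fy in self.cells.get((i, j), ()):
                    if (fx - x) ** 2 + (fy - y) ** 2 <= radius ** 2:
                        hits.append(feature_id)
        return hits
```

A lookup of this kind could support checks such as whether a railroad crossing in the map information lies within a predetermined distance of the vehicle's current location.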

The perception system 172 also includes one or more components for detecting objects external to the vehicle such as other vehicles, obstacles in the roadway, traffic light signals, signs, trees, etc. For example, the perception system 172 may include one or more LIDAR sensors, sonar devices, radar units, cameras and/or any other detection devices that record sensor data which may be processed by computing devices 110. The sensors of the perception system may detect objects and their characteristics such as location, orientation, size, shape, type (for instance, vehicle, person or pedestrian, bicyclist, etc.), heading, and speed of movement, etc. The raw data from the sensors and/or the aforementioned characteristics can be quantified or arranged into a descriptive function, vector, and/or bounding box and sent as sensor data for further processing to the computing devices 110 periodically and continuously as it is generated by the perception system 172. As discussed in further detail below, computing devices 110 may use the positioning system 170 to determine the vehicle's location and perception system 172 to detect and respond to objects when needed to reach the location safely.

For instance, FIG. 3 is an example external view of vehicle 100. In this example, roof-top housing 310 and dome housing 312 may include a LIDAR sensor as well as various cameras and radar units. In addition, housing 320 located at the front end of vehicle 100 and housings 330, 332 on the driver's and passenger's sides of the vehicle may each store a LIDAR sensor. For example, housing 330 is located in front of driver door 360. Vehicle 100 also includes housings 340, 342 for radar units and/or cameras also located on the roof of vehicle 100. Additional radar units and cameras (not shown) may be located at the front and rear ends of vehicle 100 and/or on other positions along the roof or roof-top housing 310.

FIG. 3 also includes microphones 152 (or 152a-152d) arranged at different locations on the vehicle. These microphones may be considered “passive microphones” in that the microphones do not need to include an emitter (such as those used in sonar devices). Each microphone may be a single microphone or part of a larger microphone array. In that regard, as noted above, microphones 152 (including 152a-152d) may actually include one or more microphones or microphone arrays. However, because microphones are directional, in other words an array on the front end of a vehicle does not hear sounds behind the vehicle well, more than one set of microphones or arrays may be used. In this regard, a second set of microphones 152b may be located at the rear of the vehicle 100. Additional microphone arrays, such as microphones 152c and 152d, oriented away from the sides of the vehicle (left and right or “driver” and “passenger” sides of vehicle 100) may also be used.

Although not shown in the FIGURES, in addition or alternatively, microphone arrays may be placed around a roof panel of a vehicle, such as around the circumference of the housing 312 (depicted here as a dome). This may achieve both goals (arrays of closely spaced microphones oriented towards different directions relative to the vehicle) simultaneously, but the microphone arrays would have to be placed in order to limit occlusion of sensors within the dome.

The memory 130 may store various software modules and models. These models include learned models, for instance, those that utilize machine learning, such as classifiers. A first of these models may include a classifier. The classifier, once trained, may be used to identify types of sounds received at the microphones as well as a confidence or likelihood value for each type of sound being a real sound. In order to train the classifier, examples of sounds relevant to driving decisions may be collected and labeled, for instance by a human operator or otherwise, and used to train the classifier. For instance, sounds relevant to driving may include those which would likely cause a change in behavior of the vehicle (e.g., stopping, yielding, driving more slowly, etc.). Example types of such sounds may include train whistles, train bells, railroad warning whistles, railroad warning bells, other sounds related to trains, reverse beeping sounds for trucks, crosswalk chirps, vehicle honking noises, vehicle engine revving noises, tire screeching noises, crowd noises, etc. These sounds need not, but may, include honking and/or emergency sirens. The length of the examples used to train the model may be fairly short, for instance at least as long as the labeled sound or even longer for more persistent sounds.

The memory 130 may also store associations. Each type of sound may be associated with a set of additional signals or information. For instance, a type of sound corresponding to a railroad crossing bell, a train whistle, or other sounds related to trains may be associated with a set of additional signals including flashing lights, a gate, a train, a nearby train station or railroad crossing in the vehicle's pre-stored map information, etc. As another example, a type of sound corresponding to a reverse beeping sound may be associated with a set of additional signals including a large vehicle (e.g., a truck or bus), flashing lights, etc. As another example, a type of sound corresponding to a crosswalk chirp may be associated with a set of additional signals including nearby pedestrians, a crosswalk, a walk sign, flashing or other lights, etc. As another example, a type of sound corresponding to crowd noises may be associated with a set of additional signals including a large group of people. As another example, a type of sound corresponding to tires screeching or a motorcycle engine revving may be associated with a set of additional signals including a motorcycle. These associations may be stored in any number of ways including, for instance, a table, a database, etc.
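For illustration, the associations might be held in a simple lookup table such as the one sketched below. The string keys and the helper function names are hypothetical labels chosen for this sketch; the entries themselves follow the examples given above.

```python
# Hypothetical table form of the associations; keys and labels are illustrative.
ASSOCIATIONS = {
    "train_whistle": frozenset({
        "flashing_lights", "gate", "train",
        "train_station_on_map", "railroad_crossing_on_map",
    }),
    "reverse_beep": frozenset({"large_vehicle", "flashing_lights"}),
    "crosswalk_chirp": frozenset({
        "pedestrians", "crosswalk", "walk_sign", "flashing_lights",
    }),
    "crowd_noise": frozenset({"large_group_of_people"}),
    "motorcycle_revving": frozenset({"motorcycle"}),
}


def signals_to_search(sound_type):
    """Return the set of additional signals associated with a sound type."""
    return ASSOCIATIONS.get(sound_type, frozenset())


def matched_signals(sound_type, observed):
    """Return which associated signals appear among the observed signals."""
    return signals_to_search(sound_type) & set(observed)
```

Keeping the table keyed by sound type means the sensor data only needs to be checked for the handful of signals relevant to the sound actually heard.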

The memory 130 may also include a second model that can be used to estimate a bearing of a siren noise. For instance, the timing of the siren noise reaching each of the microphones may be measured to provide measurements as to a likely bearing, or relative direction, of the source of the siren, or rather a probability distribution over possible bearings.
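As a simplified illustration of how arrival timing relates to bearing, the sketch below uses a standard far-field approximation for a single pair of microphones: a delay of dt between microphones a distance d apart implies an angle of asin(c * dt / d) from the broadside direction. This yields a single point estimate rather than the probability distribution described above, and the function name, the fixed speed of sound, and the two-microphone geometry are assumptions of this sketch.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed; approximate speed of sound in air at 20 C


def bearing_from_delay(delay_s, spacing_m):
    """Estimate bearing in degrees from broadside for one microphone pair.

    A positive delay means the sound reached the reference microphone first.
    The ratio is clamped because measurement noise can push it slightly
    outside the valid domain of asin.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A single pair cannot distinguish front from back, which is one reason several microphones or arrays at different locations may be combined.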

The memory 130 may also include a third model. This third model may use the microphone output, previously determined to include siren noise by the first model, as well as timing and amplitudes of the siren noise as input. With regard to the amplitudes, the presence and intensity of higher-frequency harmonics of a siren may also provide some indication of range, since the frequencies drop off at different rates. In some examples, the model may also use the estimated bearing and estimated range as input. In this regard, the third model may include a model which uses all or some of the aforementioned inputs to provide a probability distribution over possible ranges (distances) of the source of the siren.

The memory 130 may also include a fourth model. This fourth model may use the siren noise and timing collected over time to estimate a probability distribution over possible relative velocities of the source of the siren noise. For instance, using the change in bearing over time may provide an estimate of the relative velocity. In addition or alternatively, the model may include a neural net trained to predict likelihood over relative velocities from a snippet of the siren sound. This snippet may be any amount of time, such as 0.5 second, 1.0 second, 2.0 seconds, 3.0 seconds, 4.0 seconds or more or less. The net may be able to extract relative velocity from the change in amplitude as well as changes in the harmonics, and in some cases, from Doppler shifts of the siren frequencies.
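To illustrate why a Doppler shift carries velocity information, the sketch below inverts the textbook Doppler relation for a source moving directly toward a stationary listener, f_observed = f_emitted * c / (c - v). It is not the neural net described above. It assumes the emitted frequency is known and the listener is not moving, neither of which is stated in the disclosure, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed; approximate speed of sound in air at 20 C


def closing_speed_from_doppler(emitted_hz, observed_hz, c=SPEED_OF_SOUND_M_S):
    """Return the source's closing speed in m/s (positive when approaching).

    Derived by solving f_observed = f_emitted * c / (c - v) for v, which
    holds for a source moving along the line to a stationary listener.
    """
    return c * (1.0 - emitted_hz / observed_hz)
```

For example, under these assumptions a 1000 Hz tone heard at roughly 1062 Hz corresponds to a source approaching at about 20 m/s.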

The examples described herein utilize separate models; however, the models may be implemented as a single classifier to detect a type of sound, estimate a bearing, estimate a range, and estimate a relative velocity. In addition, one or more of the models described above may include learned models, for instance, those that utilize machine learning, such as classifiers, one or more neural nets, or trackers, such as a Kalman filter or those that take in estimated bearings and estimated ranges, and/or corresponding probability distributions, over time, and output other state estimates, such as estimated relative velocities. In still other examples, estimated bearings may be determined using various algorithms such as a generalized cross-correlation phase transform. In another example, estimated range may be computed analytically from the amplitude of the pressure sensed by the microphones, using knowledge of a range of siren volumes at a fixed distance and the fact that pressure falls off at a rate of 1/range.
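The analytic range estimate mentioned above can be sketched as follows. Because pressure falls off as 1/range, a source producing pressure p_ref at distance r_ref produces p_ref * r_ref / r at distance r, so a measured pressure together with a known span of reference pressures gives an interval of possible ranges. The function name, parameter names, and the example pressure values are assumptions for illustration only.

```python
def range_interval(measured_pa, ref_min_pa, ref_max_pa, ref_distance_m):
    """Return (nearest, farthest) plausible range in meters.

    ref_min_pa and ref_max_pa bound the pressure that sirens are assumed to
    produce at ref_distance_m. Solving p = p_ref * r_ref / r for r gives
    r = p_ref * r_ref / p at each end of that span.
    """
    if measured_pa <= 0:
        raise ValueError("measured pressure must be positive")
    nearest = ref_min_pa * ref_distance_m / measured_pa
    farthest = ref_max_pa * ref_distance_m / measured_pa
    return nearest, farthest
```

The result is an interval rather than a single value because a quiet siren nearby and a loud siren far away can produce the same measured pressure.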

Example Methods

In addition to the operations described above and illustrated in the figures, various operations will now be described. It should be understood that the following operations do not have to be performed in the precise order described below. Rather, various steps can be handled in a different order or simultaneously, and steps may also be added or omitted.

As noted above, computing devices 110 may control the movement of vehicle 100 through its environment. FIG. 4 is an example of vehicle 100 maneuvering on a portion of roadway 400 corresponding to the area of map information 200. In this regard, intersection 402 corresponds to intersection 202, railroad crossing 404 corresponds to railroad crossing 204, lane markers 410-414 correspond to lane markers 210-214, railroad crossing gates 420, 422 correspond to railroad crossing gates 220, 222, crosswalks 430, 432 correspond to crosswalks 230, 232, sidewalk 440 corresponds to sidewalk 240, stop signs 450, 452 correspond to stop signs 250, 252, lanes 460, 462 correspond to lanes 260, 262, etc. Thus, in this example, vehicle 100 is approaching intersection 402 from lane 460 following a trajectory 470 that will take vehicle 100 into intersection 402 to proceed towards lane 462 by making a left turn at intersection 402.

In this example, the vehicle's perception system 172 may provide the computing devices 110 with information about the vehicle's environment. The perception system 172 may provide the vehicle's computing devices 110 with sensor data including processed and “raw” data from the various sensors. Thus, the sensor data may include the location of objects such as lane markers 410, railroad gate 420, crosswalks 430, 432, stop signs 450, 452, and so on, as well as other road users such as vehicle 480. The characteristics of these different objects, including their shape, location, heading, velocity, etc. may be provided by the perception system 172 to the computing devices 110.

Once received by the vehicle's computing devices 110, the output of the microphones may be input into the aforementioned classifier of memory 130. In some instances, all raw audio signals from the microphones of the vehicle may be input into the classifier. The length of the audio signals analyzed may be very short or very long. Of course, different types of sounds may be best served by different input lengths, so the length could be optimized across all types of sounds that the classifier is able to identify, or a different input length could be used for each class (if asking in a binary fashion whether a sound is present). As indicated above, the output of the classifier may identify types of sounds received at the microphones as well as a confidence or likelihood value for each identified type of sound.

The likelihood values of the identified types of sounds may be compared to a first threshold likelihood value. If the first threshold likelihood value is met by a given identified type of sound, the set of additional signals associated with the given identified type of sound may also be identified from the aforementioned associations of the memory 130.
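The comparison against the first threshold and the lookup of associated additional signals might be sketched as below. The table contents paraphrase the examples given in this description, but the identifiers, the dictionary layout, and the threshold value of 0.3 are hypothetical and serve only as an illustration.

```python
# Hypothetical association table; keys and signal names are illustrative only.
ADDITIONAL_SIGNALS = {
    "railroad_warning_bell": {"train", "flashing_lights",
                              "mapped_railroad_crossing"},
    "reverse_beep": {"large_vehicle"},
    "crosswalk_chirp": {"pedestrian", "walk_sign", "mapped_crosswalk"},
}

FIRST_THRESHOLD = 0.3  # assumed value; the description does not specify one


def signals_to_search(classifier_output, threshold=FIRST_THRESHOLD):
    """Map each sound type that meets the threshold to its additional signals.

    classifier_output: dict of sound type -> likelihood value in [0, 1].
    Sound types below the threshold are dropped so no search is started.
    """
    return {
        sound_type: ADDITIONAL_SIGNALS.get(sound_type, set())
        for sound_type, likelihood in classifier_output.items()
        if likelihood >= threshold
    }
```

For example, `signals_to_search({"reverse_beep": 0.6, "crosswalk_chirp": 0.1})` keeps only the reverse beeping entry, so the search described next would only look for a large vehicle.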

The vehicle's computing devices may then begin to analyze sensor data as well as the map information in order to identify one or more additional signals of the identified set of additional signals. For instance, if the classifier identifies a railroad warning bell at a likelihood value that meets the first threshold likelihood value, the vehicle's computing devices 110 may begin to actively search LIDAR sensor data and/or camera images for a train or flashing lights. For example, the computing devices 110 may use the map information 200 to identify flashing lights of the railroad crossing gate 220/620. This may be accomplished, for instance, using various classifiers and/or image processing techniques. The vehicle's computing devices may also search the map information to identify whether there is a nearby railroad crossing. As an example, the computing devices 110 may use the map information 200 to identify railroad crossing 204.

For another instance, if the classifier identifies a reverse beeping sound at a likelihood value that meets the first threshold likelihood value, the vehicle's computing devices 110 may begin to actively search LIDAR sensor data and/or camera images for a large vehicle. For example, the computing devices 110 may use sensor data received from the perception system 172 in order to identify vehicle 480, corresponding to a large truck. This may be accomplished, for instance, using various classifiers and/or image processing techniques.

For another instance, if the classifier identifies a crosswalk chirp at a likelihood value that meets the first threshold likelihood value, the vehicle's computing devices 110 may begin to actively search LIDAR sensor data and/or camera images for nearby pedestrians. The vehicle's computing devices may also search the map information to identify whether there is a nearby pedestrian, walk sign, flashing or other lights. As an example, the computing devices 110 may use the map information 200 to identify crosswalks 430 and 432.

Each additional signal of the identified set of additional signals that is identified from the sensor data may be used to increase the likelihood value. This, in turn, may indicate that the particular type of sound identified by the classifier is more likely to be a “real” sound. By using the first threshold, the vehicle's computing devices may avoid attempting to search for additional signals which are very unlikely to actually be occurring, which would be a waste of computing resources.
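The description does not state how the likelihood value is increased. One possible update rule, shown purely as an assumption, moves the likelihood a fixed fraction of the remaining distance toward 1.0 for each additional signal found, so the value grows with every signal but never exceeds 1.0. The function name and the `boost_per_signal` parameter are hypothetical.

```python
def boosted_likelihood(base_likelihood, found_signals, boost_per_signal=0.2):
    """Raise a likelihood value once per additional signal that was found.

    Assumed update rule (not taken from the description):
    likelihood += (1 - likelihood) * boost_per_signal for each signal.
    """
    if not 0.0 <= base_likelihood <= 1.0:
        raise ValueError("base_likelihood must be within [0, 1]")
    if not 0.0 <= boost_per_signal <= 1.0:
        raise ValueError("boost_per_signal must be within [0, 1]")
    likelihood = base_likelihood
    for _ in found_signals:
        likelihood += (1.0 - likelihood) * boost_per_signal
    return likelihood
```

With no additional signals the classifier's value is returned unchanged, which matches the case where the search finds nothing to corroborate the sound.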

In addition, the sound, map information, as well as any identified additional signals may be used to identify what object is actually making the sound. For instance, the sound and any additional signals may be used as input to the second model in order to provide measurements as to a likely bearing, or relative direction, of the source of the siren, or rather a probability distribution over possible bearings. In addition, the sound, amplitude, and timing may be input into the third model to provide a probability distribution over possible ranges of the source of the siren. The fourth model may be used to estimate a probability distribution over possible velocities of the source of the siren noise. The information from the models may be provided to the one or more computing devices 110 of the vehicle. These computing devices 110 may use the estimated bearing, estimated range, estimated relative velocity and additional signals to identify a specific object in the vehicle's environment which created the sound.
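One simple way to associate the estimated bearing and range with a perceived object is a nearest-match search, sketched below. This is an illustrative stand-in rather than the method of the description, which works with probability distributions and may also use relative velocity. The `PerceivedObject` type, the vehicle-frame convention (x forward, y left), and the two scale parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PerceivedObject:
    label: str
    x_m: float  # position ahead of the vehicle (assumed frame convention)
    y_m: float  # position to the left of the vehicle


def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def most_likely_source(objects, est_bearing_deg, est_range_m,
                       bearing_scale_deg=15.0, range_scale_m=20.0):
    """Return the object that best matches the estimates, or None if empty.

    The cost is a sum of squared, scaled bearing and range errors; the
    scales are assumed tolerances, not values from the description.
    """
    best, best_cost = None, math.inf
    for obj in objects:
        obj_bearing = math.degrees(math.atan2(obj.y_m, obj.x_m))
        obj_range = math.hypot(obj.x_m, obj.y_m)
        db = angle_diff_deg(obj_bearing, est_bearing_deg) / bearing_scale_deg
        dr = (obj_range - est_range_m) / range_scale_m
        cost = db * db + dr * dr
        if cost < best_cost:
            best, best_cost = obj, cost
    return best
```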

The likelihood values of the identified types of sounds may be compared to a second threshold likelihood value. Once the second threshold likelihood value is met by a given identified type of sound, the vehicle's computing devices may actively control the vehicle in order to respond to that sound depending upon the location of the object that is actually making the sound. Meeting the second threshold may require a likelihood value greater than the first threshold likelihood value, the identification of at least one additional signal, and/or the identification of a specific combination of additional signals.
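The second-threshold test could be expressed as a single predicate with optional criteria, as in this minimal sketch. The function name, the default threshold of 0.8, and the choice to require every configured criterion at once are assumptions; the description allows the criteria to be used alone or in combination.

```python
def meets_second_threshold(likelihood, found_signals, *,
                           second_threshold=0.8,
                           min_signals=0,
                           required_combination=frozenset()):
    """Return True when every configured criterion is satisfied.

    second_threshold: assumed likelihood value, greater than the first one.
    min_signals: minimum number of distinct additional signals identified.
    required_combination: signals that must all have been identified.
    """
    found = set(found_signals)
    return (likelihood >= second_threshold
            and len(found) >= min_signals
            and set(required_combination) <= found)
```

For instance, passing `required_combination={"train", "flashing_lights"}` would defer the response to a railroad warning bell until both signals have been identified.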

The computing devices 110 may respond by controlling the vehicle 100 in an autonomous driving mode in order to react to the sound. For instance, responding may include responding to an object identified as emanating the sound by yielding to that object or simply driving more cautiously. The actual behavior of the vehicle may be defined by the constraints of the vehicle's software and the various other objects in the vehicle's environment. In addition, the observed movements of the object creating the sound, as determined from the sensor data, if any, may also be considered when determining how best to respond to the object, thereby further improving the usefulness of the response. Of course, if the object making the sound is behind the vehicle and not approaching the vehicle, the vehicle's computing devices may simply ignore the sound (i.e. the object has already been passed by and is not relevant to driving decisions).

In some examples, the computing devices 110 may begin to cause the vehicle 100 to react to the sound as soon as the first threshold likelihood value is met. At the same time, the computing devices may continue to attempt to identify additional signals as discussed above. For instance, if a sound is identified as a train whistle by the classifier to a likelihood value that meets the first threshold, the vehicle's computing devices may automatically slow the vehicle down as an initial response or initial safety measure. As another instance, if a sound is identified as a reverse beeping sound by the classifier to a likelihood value that meets the first threshold, the vehicle's computing devices may automatically yield to all larger vehicles as an initial response. The vehicle's reaction will be “stronger” or more confident the greater the likelihood value. Once additional signals are identified, these may be used to determine whether to ignore the sound or to continue to respond to the sound.

FIG. 5 is a flow diagram 500 that may be performed by one or more processors such as one or more processors 120 of computing devices 110 in order to detect and respond to sounds for a vehicle having an autonomous driving mode. At block 510, an audible signal corresponding to a sound received at one or more microphones of the vehicle is received. At block 520, sensor data generated by a perception system of the vehicle identifying objects in an environment of the vehicle is received. The perception system includes one or more sensors. At block 530, a type of sound is determined by inputting the audible signal into a classifier. At block 540, a set of additional signals is identified based on the determined type of sound. At block 550, the sensor data is processed in order to identify one or more additional signals of the set of additional signals. At block 560, the vehicle is controlled in the autonomous driving mode in order to respond to the sound based on the one or more additional signals and the type of sound.
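The ordering of blocks 510 through 560 can be summarized in a short sketch in which each stage is supplied by the caller. The function and parameter names are hypothetical, and the classifier, signal search, and controller are deliberately left as injected callables because the description does not fix their implementations.

```python
from typing import Any, Callable, Dict, Set


def detect_and_respond(
    audible_signal: Any,                              # block 510 input
    sensor_data: Any,                                 # block 520 input
    classify: Callable[[Any], str],
    associations: Dict[str, Set[str]],
    find_signals: Callable[[Any, Set[str]], Set[str]],
    control: Callable[[str, Set[str]], Any],
) -> Any:
    """Run the stages of flow diagram 500 in order (illustrative only)."""
    sound_type = classify(audible_signal)              # block 530
    candidates = associations.get(sound_type, set())   # block 540
    found = find_signals(sensor_data, candidates)      # block 550
    return control(sound_type, found)                  # block 560
```

Keeping the stages as parameters makes it possible to exercise the flow with simple stand-ins before any real classifier or controller exists.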

The features described herein allow a vehicle driving in an autonomous mode to automatically detect and respond to sounds. Not only does this allow the vehicle to react to situations when objects relevant to such situations are occluded and even before such situations would be “visible” to other sensors such as LIDAR and cameras, but by doing so, it also allows the vehicle more time to respond to such situations. This in turn may make the vehicle significantly safer on the roads.

Unless otherwise stated, the foregoing alternative examples are not mutually exclusive, but may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of the embodiments should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. In addition, the provision of the examples described herein, as well as clauses phrased as “such as,” “including” and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only one of many possible embodiments. Further, the same reference numbers in different drawings can identify the same or similar elements.

1. A method of detecting and responding to sounds for a vehicle having an autonomous driving mode, the method comprising: receiving, by one or more processors, an audible signal corresponding to a sound received at one or more microphones of the vehicle; determining a type of the sound; and controlling, by the one or more processors, the vehicle in the autonomous driving mode in order to: (1) initially respond to the sound based on the type of the sound, and (2) continue to respond to the sound based on one or more additional signals and the type of the sound, wherein the one or more additional signals are identified based on the type of the sound.
2. The method of claim 1, further comprising determining that the one or more additional signals are a predetermined combination of additional signals, and wherein controlling the vehicle is further based on the determination that the one or more additional signals are the predetermined combination of additional signals.
3. The method of claim 1, further comprising: receiving, by the one or more processors, sensor data from one or more sensors of the vehicle, the sensor data identifying objects in an environment of the vehicle; and processing, by the one or more processors, the sensor data in order to identify the one or more additional signals.
4. The method of claim 1, wherein the type of the sound is associated with a set of additional signals.
5. The method of claim 1, further comprising inputting the audible signal into a classifier.
6. The method of claim 1, wherein the type of the sound is a train whistle.
7. The method of claim 1, wherein the type of the sound is a reverse beeping sound.
8. The method of claim 1, wherein the type of the sound is a crosswalk chirp.
9. The method of claim 1, wherein the controlling further comprises automatically slowing down the vehicle in response to identifying the type of the sound as a train whistle.
10. The method of claim 1, wherein the controlling further comprises automatically yielding to all larger vehicles in response to identifying the type of the sound as a reverse beeping sound.
11. A system for detecting and responding to sounds for a vehicle having an autonomous driving mode, the system comprising: one or more processors configured to: receive an audible signal corresponding to a sound received at one or more microphones of the vehicle; and control the vehicle in the autonomous driving mode in order to: (1) initially respond to the sound based on a type of the sound, and (2) continue to respond to the sound based on one or more additional signals and the type of the sound, wherein the one or more additional signals are identified based on the type of the sound.
12. The system of claim 11, wherein the one or more processors are configured to determine that the one or more additional signals are a predetermined combination of additional signals, and wherein the vehicle is controlled based on the determination that the one or more additional signals are the predetermined combination of additional signals.
13. The system of claim 11, further comprising a memory including a classifier, wherein the classifier is configured to determine the type of the sound when the audible signal is input into the classifier.
14. The system of claim 11, further comprising one or more sensors configured to generate sensor data identifying objects in an environment of the vehicle, wherein the one or more processors are configured to receive and process the sensor data in order to identify the one or more additional signals.
15. The system of claim 11, wherein the type of the sound is associated with a set of additional signals.
16. The system of claim 11, wherein the type of the sound is a train whistle.
17. The system of claim 11, wherein the type of the sound is a reverse beeping sound.
18. The system of claim 11, wherein the type of the sound is a crosswalk chirp.
19. The system of claim 11, wherein the one or more processors control the vehicle to automatically slow the vehicle down when the type of the sound is a train whistle.
20. The system of claim 11, wherein the one or more processors control the vehicle to automatically yield to all larger vehicles when the type of the sound is a reverse beeping sound.